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Abstract 


We develop a new bidirectional algorithm for estimating Markov chain multi-step 
transition probabilities: given a Markov chain, we want to estimate the proba¬ 
bility of hitting a given target state in £ steps after starting from a given source 
distribution. Given the target state t, we use a (reverse) local power iteration to 
construct an ‘expanded target distribution’, which has the same mean as the quan¬ 
tity we want to estimate, but a smaller variance - this can then be sampled effi¬ 
ciently by a Monte Carlo algorithm. Our method extends to any Markov chain on 
a discrete (finite or countable) state-space, and can be extended to compute func¬ 
tions of multi-step transition probabilities such as PageRank, graph diffusions, hit- 
ting/retum times, etc. Our main result is that in ‘sparse’ Markov Chains - wherein 
the number of transitions between states is comparable to the number of states - 
the running time of our algorithm for a uniform-random target node is order-wise 
smaller than Monte Carlo and power iteration based algorithms; in particular, our 
method can estimate a probability p using only 0(1/ y/p) running time. 

1 Introduction 

Markov chains are one of the workhorses of stochastic modeling, finding use across a variety of 
applications - MCMC algorithms for simulation and statistical inference; to compute network cen¬ 
trality metrics for data mining applications; statistical physics; operations management models for 
reliability, inventory and supply chains, etc. In this paper, we consider a fundamental problem as¬ 
sociated with Markov chains, which we refer to as the multi-step transition probability estimation 
(or MSTP-estimation) problem: given a Markov Chain on state space S with transition matrix P, 
an initial source distribution a over S, a target state t £ S and a fixed length £, we are interested in 
computing the £-step transition probability from a to t. Formally, we want to estimate: 



(1) 


where e t is the indicator vector of state i. A natural parametrization for the complexity of MSTP- 
estimation is in terms of the minimum transition probabilities we want to detect: given a desired 
minimum detection threshold <5, we want algorithms that give estimates which guarantee small rela¬ 
tive error for any (a, t, £) such that p e a [ t] > 6. 

Parametrizing in terms of the minimum detection threshold S can be thought of as benchmarking 
against a standard Monte Carlo algorithm, which estimates p e a [t] by sampling independent /'-step 
paths starting from states sampled from a. An alternate technique for MSTP-estimation is based on 
linear algebraic iterations, in particular, the (local) power iteration. We discuss these in more detail 
in Section 1.2. Crucially, however, both these techniques have a running time oftt(l/5) for testing 
if p£[f] > 5 (cf. Section 1.2). 
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1.1 Our Results 


To the best of our knowledge, our work gives the first bidirectional algorithm for MSTP-estimation 
which works for general discrete state-space Markov chains 1 . The algorithm we develop is very 
simple, both in terms of implementation and analysis. Moreover, we prove that in many settings, it 
is order-wise faster than existing techniques. 

Our algorithm consists of two distinct/orwara? and reverse components, which are executed sequen¬ 
tially. In brief, the two components proceed as follows: 

• Reverse-work: Starting from the target node t, we perform a sequence of reverse local power 
iterations - in particular, we use the REVERSE-PUSH operation defined in Algorithm 1. 

• Forward-work: We next sample a number of random walks of length £, starting from a and 
transitioning according to P, and return the sum of residues on the walk as an estimate of p£[t]. 

This full algorithm, which we refer to as the Bidirectional-MSTP estimator, is formalized in 
Algorithm 2. It works for all countable-state Markov chains, giving the following accuracy result: 

Theorem 1 (For details, refer Section 2.3). Given any Markov chain P, source distribution cr, 
terminal state t, length (, threshold 8 and relative error e, Bidirectional-MSTP (Algorithm 2) 
returns an unbiased estimate p e a [£] for p e a [f], which, with high probability, satisfies: 

I Pi M - Pi [t] | < max {ep* [t],S} . 

Since we dynamically adjust the number of REVERSE-PUSH operations to ensure that all residues 
are small, the proof of the above theorem follows from straightforward concentration bounds. 

Since Bidirectional-MSTP combines local power iteration and Monte Carlo techniques, a nat¬ 
ural question is when the algorithm is faster than both. It is easy to to construct scenarios where the 
runtime of Bidirectional-MSTP is comparable to its two constituent algorithms - for example, 
if t has more than 1/8 in-neighbors. Surprisingly, however, we show that in sparse Markov chains 
and for typical target states, Bidirectional-MSTP is order-wise faster: 

Theorem 2 (For details, refer Section 2.3). Given any Markov chain P, source distribution a, 
length i, threshold 8 and desired accuracy e; then for a uniform random choice of t £ S, the 

Bidirectional-MSTP algorithm has a running time of 0(f 3 ^ 2 \Jd/8), where d is the average 
number of neighbors of nodes in S. 

Thus, for typical targets, we can estimate transition probabilities of order 8 in time only 0(l/s/8). 
Note that we do not need for every state that the number of neighboring states is small, but rather, 
that they are small on average - for example, this is true in ‘power-law’ networks, where some 
nodes have very high degree, but the average degree is small. The proof of this result is based on a 
modification of an argument in [2] - refer Section 2.3 for details. 

Estimating transition probabilities to a target state is one of the fundamental primitives in Markov 
chain models - hence, we believe that our algorithm can prove useful in a variety of application 
domains. In Section 3, we briefly describe how to adapt our method for some of these applica¬ 
tions - estimating hitting/return times and stationary probabilities, extensions to non-homogenous 
Markov chains (in particular, for estimating graph diffusions and heat kernels), connections to lo¬ 
cal algorithms and expansion testing. In addition, our MSTP-estimator could be useful in several 
other applications - estimating ruin probabilities in reliability models, buffer overflows in queueing 
systems, in statistical physics simulations, etc. 

1.2 Existing Approaches for MSTP-Estimation 

There are two main techniques used for MSTP-estimation. The first is a natural Monte Carlo al¬ 
gorithm: we estimate p e a [t\ by sampling independent f-step paths, each starting from a random 
state sampled from a. A simple concentration argument shows that for a given value of 8, we need 
@(1/(5) samples to get an accurate estimate of p£[f], irrespective of the choice of t, and the structure 

1 Bidirectional estimators have been developed before for reversible Markov chains [1]; our method however 

is not only more general, but conceptually and operationally simpler than these techniques (cf. Section 1 .2). 
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of P. Note that this algorithm is agnostic of the terminal state t; it gives an accurate estimate for any 
t such that p f a [f] > 5. 

On the other hand, the problem also admits a natural linear algebraic solution, using the standard 
power iteration starting with <r, or the reverse power iteration starting with e t (which is obtained 
by re-writing Equation (1) as p e a [t] := a(e t (P T Y) T ). When the state space is large, performing a 
direct power iteration is infeasible - however, there are localized versions of the power iteration that 
are still efficient. Such algorithms have been developed, among other applications, for PageRank 
estimation [3, 4] and for heat kernel estimation [5]. Although slow in the worst case 2 , such local 
update algorithms are often fast in practice, as unlike Monte Carlo methods they exploit the local 
structure of the chain. However even in sparse Markov chains and for a large fraction of target 
states, their running time can be 17(1/5). For example, consider a random walk on a random d- 
regular graph and let 6 = o(l/n) - then for £ ~ log d (l/5), verifying [t] > 6 is equivalent to 
uncovering the entire log d (l/5) neighborhood of s. Since a large random fi- regular graph is (whp) 
an expander, this neighborhood has 17(1/5) distinct nodes. Finally, note that as with Monte Carlo, 
power iterations can be adapted to either the source or terminal state, but not both. 

For reversible Markov chains, one can get a bidirectional algorithms for estimating p/ \t] based on 
colliding random walks. For example, consider the problem of estimating length-2f random walk 
transition probabilities in a regular undirected graph G(V. E ) on n vertices [1, 6], The main idea 
is that to test if a random walk goes from s to t in 2£ steps with probability > 5, we can generate 
two independent random walks of length £, starting from s and t respectively, and detect if they 
terminate at the same intermediate node. Suppose p w ,q w are the probabilities that a length-f walk 
from s and t respectively terminate at node w - then from the reversibility of the chain, we have that 
p 2 J[t\ = PwQw\ this is also the collision probability. The critical observation is that if we 

generate -y/1/5 walks from s and /, then we get 1/5 potential collisions, which is sufficient to detect 
if p(/[f] > 5. This argument forms the basis of the birthday-paradox, and similar techniques used 
in a variety of estimation problems (eg., see [7]). Showing concentration for this estimator is tricky 
as the samples are not independent; moreover, to control the variance of the samples, the algorithms 
often need to separately deal with ‘heavy’ intermediate nodes, where p w or q w are much larger than 
0(l/n). Our proposed approach is much simpler both in terms of algorithm and analysis, and more 
significantly, it extends beyond reversible chains to any general discrete state-space Markov chain. 

The most similar approach to ours is the recent FAST-PPR algorithm of Lofgren et al. [2] for PageR¬ 
ank estimation; our algorithm borrows several ideas and techniques from that work. However, the 
FAST-PPR algorithm relies heavily on the structure of PageRank - in particular, the fact that the 
PageRank walk has Geometric(a) length (and hence can be stopped and restarted due to the mem¬ 
oryless property). Our work provides an elegant and powerful generalization of the FAST-PPR 
algorithm, extending the approach to general Markov chains. 

2 The Bidirectional MSTP-estimation Algorithm 

2.1 Algorithm 

As described in Section 1.1, given a target state t, our bidirectional MSTP algorithm keeps track of 
a pair of vectors - the estimate vector q/ £ 1Z" and the residual vector r/ £ 1Z 71 - for each length 
k £ {0,1, 2,..., f\. The vectors are initially all set to 0 (i.e., the all-0 vector), except r/ which is 
initialized as e t . Moreover, they are updated using a reverse push operation defined as: 


Algorithm 1 REVERSE-PUSH(u, i) 


i +1 


Inputs: Transition matrix P, estimate vector q), residual vectors r(, r: 

1: return New estimate vectors {q(} and residual-vectors { r \} computed as: 


q; <- q* 


; §--u ) fh;) 


A - (r?,e^)e^; 


ri+1 


£- r: 


i+1 


+ (e v P T ) 


2 In particular, local power iterations are slow if a state has a very large out-neighborhood (for the forward 
iteration) or in-neighborhood (for the reverse update). 


3 






The main observation behind our algorithm is that we can re-write p e a [t\ in terms of {q{. r{} as an 
expectation over random sample-paths of the Markov chain as follows (cf. Equation (3)): 

t 

p i[t] = <*, qf)+E H~ k m] (2) 

k =0 

In other words, given vectors {q{ - , r^}, we can get an unbiased estimator for p e a [t] by sampling a 
length-^ random trajectory {Vo, Vj,.... Vp } of the Markov chain P starting at a random state Vo 
sampled from the source distribution er, and then adding the residuals along the trajectory as in 
Equation (2). We formalize this bidirectional MSTP algorithm in Algorithm 2. 


Algorithm 2 Bidirectional-MSTP(P, a. t. / ; max , S) 

Inputs: Transition matrix P , source distribution <r, target state t, maximum steps /j nax , minimum 
probability threshold 5, relative error bound e, failure probability pf 
1: Set accuracy parameter c based on e and pf and set reverse threshold S r (cf. Theorems 1 and 2) 
(in our experiments we use c = 7 and S r = y/Sfc) 

2: Initialize: Estimate vectors q[ = 0 , V k £ {0,1, 2,..., £}, 

Residual vectors r? = e t and r? = 0 , V k £ {1, 2,3,..., £} 

3: for i e {0,1,..., 4,8.x} do 
4: while 3v G S s.t. r l t [v\ > 5 r do 

5: Execute REVERSE-PUSH(u, i) 

6: end while 

7: end for 

8: Set number of sample paths rif = c£ max cV/<5 (See Theorem 1 for details) 

9: for index ie {1,2,..., n/} do 
10: Sample starting node V^° ~ cr 

11: Generate sample path Tj = {V®, V /,..., TA £max } of length £ max starting from 

12: For £ € {1, 2,... ,£ max }'. sample k ~ Uniform[0,£] and compute S ( t i = £r^~ k [V i k ] 

(We reinterpret the sum over k in Equation 2 as an expectation and sample k rather sum over 
k < £ for computational speed.) 

13: end for 

14: return {p^]}^ max] , where p* [i] = (a,qf) + (l/n f ) EZi S U 


2.2 Some Intuition Behind our Approach 

Before formally analyzing the performance of our MSTP-estimation algorithm, we first build some 
intuition as to why it works. In particular, it is useful to interpret the estimates and residues in 
probabilistic/combinatorial terms. In Figure 1, we have considered a simple Markov chain on three 
states - Solid, Hollow and Checkered (henceforth (S, H, C)). On the right side, we have illustrated 
an intermediate stage of reverse work using S as the target, after performing the REVERSE-PUSH 
operations (S, 0), (H, 1), (C, 1) and (S, 2) in that order. Each push at level i uncovers a collection 




Figure 1: Visualizing a sequence of REVERSE-PUSH operations: Given the Markov chain on the 
left with S as the target, we perform REVERSE-PUSH operations (S, 0), ( II. 1), (C, 1 2). 
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of length-(i + 1) paths terminating at S - for example, in the figure, we have uncovered all length 
2 and 3 paths, and several length 4 paths. The crucial observation is that each uncovered path of 
length i starting from a node v is accounted for in either eg', or r* . In particular, in Figure 1, all paths 
starting at solid nodes are stored in the estimates of the corresponding states, while those starting at 
blurred nodes are stored in the residue. Now we can use this set of pre-discovered paths to boost the 
estimate returned by Monte Carlo trajectories generated starting from the source distribution. The 
dotted line in the figure represents the current reverse-work frontier - it separates the fully uncovered 
neighborhood of (S, 0) from the remaining states (v, i). 

In a sense, what the REVERSE-PUSH operation does is construct a sequence of importance¬ 
sampling weights, which can then be used for Monte Carlo. An important novelty here is that 
the importance-sampling weights are: (i) adapted to the target state, and ( ii ) dynamically adjusted 
to ensure the Monte Carlo estimates have low variance. Viewed in this light, it is easy to see how 
the algorithm can be modified to applications beyond basic MSTP-estimation: for example, to non- 
homogenous Markov chains, or for estimating the probability of hitting a target state t for the first 
time in l steps (cf. Section 3). Essentially, we only need an appropriate reverse-push/dynamic 
programming update for the quantity of interest (with associated invariant, as in Equation (2)). 

2.3 Performance Analysis 

We first formalize the critical invariant introduced in Equation (2): 

Lemma 1. Given a terminal state t, suppose we initialize = 0, = e t and qt, rjj = OVA: > 

0. Then for any source distribution a and length £, after any arbitrary sequence of REVERSE- 
PUSH(v, k) operations, the vectors {q^, r^ } satisfy the invariant: 

i 

pi[t} = (a,c l l}+J2^P k y t - k ) (3) 

k =0 


Proof The proof follows the outline of a similar result in Andersen et al. [4] for PageRank esti¬ 
mation. First, note that Equation (3) can be re-written as p^.[f] = ^<7,qf + '}2k=o r t~ k (P k ) T ^- 
Note that under the initial conditions specified in Algorithm 2, for any a , A, the invariant reduces 
to: pi[t] = ^cr, qf + J2k=o r t~ k (P k ) T ^ = ( a ^t(P T ) e ) = (t7P^,e t ), which is true by definition. 

We now prove the invariant by induction. Suppose at some stage, vectors | qj, i't} satisfy Equa¬ 
tion (3); to complete the proof, we need to show that for any pair (v. i), the RE VER SE-PUSH( t >, i) 
operation preserves the invariant. Let {qjj, } denote the estimate and residual vectors after the 
REVERSE-PUSH(u, i) operation is applied, and define: 

k= Ul+J2^~ k (P k ) T ) - U + i2 r t~ k ( pk ) T ) 

\ fc=0 / \ k =0 / 

Now, to prove the invariant, it is sufficient to show A* = 0 for any choice of (v. i). Clearly this is 
true if £ < i\ this leaves us with two cases: 

• If £ = i we have: A* = (q t * - qj) + (rj - rj) = (r$,e v )e v - (r t fc ,e„)e„ = 0 

• If £ > i, we have: 

At = ( r] +1 - rj +1 + (rl - r \)P T ) (p T ) M 

= (( r t,e„)(e f P r ) - ((rt,e,„)eJP T ) (P T ) = 0 

This completes the proof of the lemma. □ 

Using this result, we can now characterize the accuracy of the Bidirectional-MSTP algorithm: 

Theorem 1. We are given any Markov chain P, source distribution a, terminal state t, maximum 
length A max and also parameters 5,py and e (i.e., the desired threshold, failure probability and 
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relative error). Suppose we choose any reverse threshold 8 r > 8, and set the number of sample- 
paths nf = c8 r /5, where c = max {6e/e 2 ,1/In 2} In (2£ max /p/). Then for any length £ < £ max 
with probability at least 1 — py, the estimate returned by Bidirectional-MSTP satisfies: 

|'PaM-PaM| < max {ep£[*],£} . 


Proof Given any Markov chain P and terminal state t, note first that for a given length t < f max , 
Equation (2) shows that the estimate p^[i] is an unbiased estimator. Now, for any random-trajectory 
Tk, we have that the score Sf k obeys: ( i ) E[S'f fc ] < p e a [t] and (ii) Sf k £ [0,£S r ]', the first in¬ 
equality again follows from Equation (2), while the second follows from the fact that we executed 
REVERSE-PUSH operations until all residual values were less than S r . 

Now consider the rescaled random variable X k = S[ k /(£8 r ) and X = j A'^; then we 

have that X k £ [0,1], E[A] < {ny / £8 r )p f a [t\ and also (A — E[A]) = (rif/£5 r ){p t (J [t\ — p^.[f]). 

Moreover, using standard Chernoff bounds (cf. Theorem 1.1 in [8]), we have that: 


P[|X-E[A]| > e E[X]] < 2exp 


e 2 E[X] \ 

3 ) 


and P[A' > b\ < 2 b for any b > 2eE[X] 


Now we consider two cases: 

1. E[5^ fc ] > 8/ 2e (i.e., E[X] > rif8/2e£8 r = c/2e): Here, we can use the first concentration 
bound to get: 


[\pi[t]-pi[t]\>epi[t}]=V 


\X-E[X]\>^p i\f\ 


< 2 exp — 


2 E[X]\ 

“a - ) 


< 2 exp — — 1, 


< P[|A-E[A]| > eE[X]\ 
e 2 c\ 


where we use that rif = c£ max S r /S (cf. Algorithm 2). Moreover, by the union bound, we have: 


U {|piM-P^]|>epSM} 


t<t 


— 2f max exp I 


32e 


Now as long as c > (6e/e 2 ) In (2 £ max /pf), we get the desired failure probability. 

2. EfS'j k ] < 8/2e (i.e., E[AT] < c/2e): In this case, note first that since X > 0, we have that 
p^.[ij — PctM < (tif/£8 r ) E[A'] < 8/2e < 8. On the other hand, we also have: 


p[p i[t] 


P t[t] > 8] =P 


A - E[A] > 


nj8 

£8 r 


< P [A > c] < 2~ c , 


where the last inequality follows from our second concentration bound, which holds since we 
have c > 2eE[X]. Now as before, we can use the union bound to show that the failure probability 
is bounded by py as long as c > log 2 (£ max /pf). 


Combining the two cases, we see that as long as c > max {6e/e 2 , l/ln2} In (2£ max /p/), then 


have ] 


Ur<^ max {|p£M - Pa M| > max{<5, ep^. [f]}} 


<Pf- 


we 

□ 


One aspect that is not obvious from the intuition in Section 2.2 or the accuracy analysis is if using a 
bidirectional method actually improves the running time of MSTP-estimation. This is addressed by 
the following result, which shows that for typical targets, our algorithm achieves significant speedup: 

Theorem 2. Let any Markov chain P, source distribution a, maximum length £ max and parameters 
8,pf and e be given. Suppose we set 8 r = ^ \ 0 ^(i / p f j- Then for a uniform random choice of 


t £ S, the Bidirectional-MSTP algorithm has a running time of O 
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Proof. The runtime of Algorithm 2 consists of two parts: 

Forward-work (i.e., for generating trajectories): we generate n/ = c£ max S r /8 sample trajectories, 
each of length £ max - hence the running time is O (c<Wjj- lax /(5) for any Markov chain P, source 
distribution <1 and target node t. Substituting for c from Theorem 1, we get that the forward-work 

running time Tf = O ^ e ™°~ Sr lo J,^ max ^ Pf ^ ^ . Reverse-work (i.e., for REVERSE-PUSH operations): 
Let T r denote the reverse-work runtime for a uniform random choice oft£S. Then we have: 

^ ^max 

E l T r] = H H + ^{REVERSE-PUSH^,fc) is executed} 

£6<S k =0 u(E<S 

Now for a given t £ S and k £ {0,1,... ,£ max }. note that the only states v £ S on which we 
execute RE VERSE-PUS H ( v. k ) are those with residual T t( v ) > Sr - consequently, for these states, 
we have that q} (v) > 8 r , and hence, by Equation (3), we have that [f] > S r (by setting a = e v , 
i.e., starting from state v). Moreover, a REVERSE-PUSH (v. k) operation involves updating the 
residuals for d ln (v) +1 states. Note that Y2tes Pe M = 1 and hence, via a straightforward counting 
argument, we have that for any v £ S, Y2tes ^{p fc [t]>M — 1 /S r - Thus, we have: 

t t 

-i '•max -i 4 max 

E PU i Si £ E = joj E E I>‘ n M + 

££<S k —0 v£S vES k —0 ££<S 

< t +') ■ «'■"(”)+^° Cr ■ - ° (P 3 ) 

Finally, we choose 8 r = yj t log^/ / Pf j to balance Tf and T r and get the result. □ 

3 Applications of MS TP estimation 

• Estimating the Stationary Distribution and Hitting Probabilities: MSTP-estimation can be 
used in two ways to estimate stationary probabilities 7r[t], First, if we know the mixing time T mlx 
of the chain P, we can directly use Algorithm 2 to approximate 7r[i] by setting £ max = T mix and 
using any source distribution a. Theorem 2 then guarantees that we can estimate a stationary 

probability of order S in time O ( t A J^ x yjd/8). In comparison, Monte Carlo has 0{T m i X /5) run¬ 
time. We note that in practice, we usually do not know the mixing time - in such a setting, our 
algorithm can be used to compute an estimate of p e a [t] for all values of /: < /: ln;ix . 

An alternative is to modify Algorithm 2 to estimate the truncated hitting time i.e., the 

probability of hitting t starting from er for the first time in i steps). By setting a = e t , we get 
an estimate for the expected truncated return time E[T t !m<w] = Now ’ 

using that fact that n[t] = l/E[T t ], we can get a lower bound for Tt[t\ which converges to 7r[f] 
as (max —> 00 . We note also that the truncated hitting time has been shown to be useful in other 
applications such as identifying similar documents on a document-word-author graph [9], 

To estimate the truncated hitting time, we modify Algorithm 2 as follows: at each stage i £ 
{1,2,...,£ max } (note: not i = 0), instead of REVERSE-PUSH(t, i), we update qj[t] = qj[i] + 
r }[£], set v\\t] = 0 and do not push back r \[t\ to the in-neighbors oft in the (i + l) th stage. The 
remaining algorithm remains the same. It is easy to see from the discussion in Section 2.2 that the 
resulting quantity is an unbiased estimate of P[Hitting time of t = £\X 0 ~ a] - we omit 

a formal proof due to lack of space. 

• Exact Stationary Probabilities in Strong Doeblin chains: A strong Doeblin chain [10] is ob¬ 
tained by mixing a Markov chain P and a distribution a as follows: at each transition, the pro¬ 
cess proceeds according to P with probability a, else samples a state from a. Doeblin chains are 
widely used in ML applications - special cases include the celebrated PageRank metric [11], vari¬ 
ants such as HITS and SALSA [12], and other algorithms for applications such as ranking [13] and 
structured prediction [14]. An important property of these chains is that if we sample a starting 
node Vo from a and sample a trajectory of length Geometric(a) starting from Vo, then the termi¬ 
nal node is an unbiased sample from the stationary distribution [15]. There are two ways in which 
our algorithm can be used for this purpose: one is to replace the REVERSE-PUSH algorithm with 
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Table 1: Datasets used in experiments 


Dataset 

Type 

# Nodes 

# Edges 

Pokec 

directed 

1.6M 

30.6M 

LiveJournal 

undirected 

4.8M 

69M 

Orkut 

undirected 

3.1M 

117M 

Twitter-2010 

directed 

42M 

1.5B 


a corresponding local update algorithm for the strong Doeblin chain (similar to the one in Ander¬ 
sen et al. [4] for PageRank), and then sample random trajectories of length Georrudric(o). A 
more direct technique is to choose some f max >> 1/a, estimate |p£.[i]} V£ € [/max] and then 
directly compute the stationary distribution as p[t] = c/ -1 (l — a)p^.[f]. 

• Graph Diffusions: If we assign a weight a* to random walks of length i on a (weighted) 

graph, the resulting scoring functions f(P,a)[t] := ^°2. 0 (c T P l ) M are known as a graph 
diffusions [16] and are used in a variety of applications. The case where on = cri -1 (l — a) 
corresponds to PageRank. If instead the length is drawn according to a Poisson distribution 
(i.e., cii = e~ a a l /i\), then the resulting function is called the heat-kernel h(G,a) - this too 
has several applications, including finding communities (clusters) in large networks [5], Note 
that for any function / as defined above, the truncated sum / fmax = g x at (p^P 1 ) obeys 
11/ — / £max ||oo < +i ai ■ Thus a guarantee on an estimate for the truncated sum directly 

translates to a guarantee on the estimate for the diffusion. We can use MSTP-estimation to effi¬ 
ciently estimate these truncated sums. We perform numerical experiments on heat kernel estima¬ 
tion in the next section. 

• Conductance Testing in Graphs: MSTP-estimation is an essential primitive for conductance 
testing in large Markov chains [1], In particular, in regular undirected graphs. Kale et al [6] 
develop a sublinear bidirectional estimator based on counting collisions between walks in order 
to identify ‘weak’ nodes - those which belong to sets with small conductance. Our algorithm can 
be used to extend this process to any graph, including weighted and directed graphs. 

• Local Algorithms: There is a lot of interest recently on local algorithms - those which perform 
computations given only a small neighborhood of a source node [17]. In this regard, we note that 
Bidirectional-MSTP gives a natural local algorithm for MSTP estimation, and thus for the 
applications mentioned above - given a /c-hop neighborhood around the source and target, we can 
perform Bidirectional-MSTP with £ max set to k. The proof of this follows from the fact 
that the invariant in Equation (2) holds after any sequence of REVERSE-PUSH operations. 

4 Experiments 

To demonstrate the efficiency of our algorithm on large Markov chains, we use heat kernel estima¬ 
tion (cf. Section 3) as an example application. The heat kernel is a non-homogenous Markov chain, 
defined as the probability of stopping at the target on a random walk from the source, where the walk 
length is sampled from a Poisson(f) distribution. It is important in practice as it has empirically 
been shown to detect communities well, in the sense that the heat kernel from s to t tends to be 
larger for nodes t sharing a community with s than for other nodes t. Note that for finding complete 
communities around s, previous algorithms which work from a single s to many targets are more 
suitable. Our algorithm is relevant for estimating the heat kernel from s to a small number of targets. 
For example, in the application of personalized search on a social network, if user s is searching for 
nodes t satisfying some query, we might find the top -k results using conventional information re¬ 
trieval methods for some small k, then compute the heat kernel to each of these k targets, to allow 
us to place results t sharing an inferred community with s above less relevant results. A related 
example is if a social network user s is viewing a list of nodes t, say the set of users attending some 
event, our estimator can be used to rank the users t in that list in decreasing order of heat kernel. 
This can help s quickly find the users t closest to them in the network. 

We use our estimator to compute heat kernels on diverse, real-world graphs that range from millions 
to billions of edges as described in Table 1. For each graph, for random (source, target) pairs, we 
compute the heat kernel. We set average walk-length l = 5 since for larger values of £, the walk 
mixes into the stationary distribution, “forgetting” where it started. We set a maximum length of 10 
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Figure 2: Estimating heat kernels: Bidirectional MSTP-estimation vs. Monte Carlo, Forward Push. 
For a fair comparison of running time, we choose parameters such that the mean relative error of all 
three algorithms is around 10%. Notice that our algorithm is 100 times faster per (source, target) 
pair than state-of-the-art algorithms. 


standard deviations above the expectation t + 10 \/1 ss 27; the probability of a walk being longer 
than this is 10 “ 12 , so we can ignore that event. 

We compare Algorithm 2 to two state-of-the-art algorithms: The natural Monte Carlo algorithm, and 
the push forward algorithm introduced in [5], All three have parameters which allow them to trade 
off speed and accuracy. For a fair comparison, we choose parameters such that the mean relative 
error of all three algorithms is around 10%, and for those parameters we measure the mean running 
time of all three algorithms. We implemented all three algorithms in Scala (for the push forward 
algorithm, our implementation follows the code linked from [5]). 

Our results are presented in Figure 4. We find that on these four graphs, our algorithm is lOOx 
faster (per (s,f) pair) than these state-of-the-art algorithms. For example, on the Twitter graph, our 
algorithm can estimate a heat kernel score is 0.1 seconds, while the state-of-the-art algorithms both 
take more than 4 minutes. We note though that the state-of-the-art algorithms were designed for 
community detection, and return scores from the source to all targets, rather than just one target. 
Our algorithm’s advantage applies in applications where we want the score from a source to a single 
target or small set of targets. 

The Pokec [18], Five Journal [19], and Orkut [19] datasets were downloaded from the Stanford 
SNAP project [20]. The Twitter-2010 [21] was downloaded from the Faboratory for Web Algorith- 
mics [22]. For reproducibility, the source code of our experiments are available on our website 3 . 
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